Abstract: The multiple measurement vector (MMV) problem addresses the identification of unknown input vectors that share a common sparse support. The MMV problem has traditionally been addressed either by sensor array signal processing or by compressive sensing. However, recent breakthroughs in this area such as compressive MUSIC (CS-MUSIC) or subspace-augmented MUSIC (SA-MUSIC) optimally combine compressive sensing (CS) and array signal processing such that $k - r$ supports are first found by CS and the remaining $r$ supports are determined by a generalized MUSIC criterion, where $k$ and $r$ denote the sparsity and the number of independent snapshots, respectively. Even though such a hybrid approach significantly outperforms the conventional algorithms, its performance heavily depends on the correct identification of the $k - r$ partial support by the compressive sensing step, which often deteriorates the overall performance. The main contribution of this paper is, therefore, to show that as long as $k - r + 1$ correct supports are included in any $k$-sparse CS solution, the optimal $k - r$ partial support can be found using a subspace fitting criterion, significantly improving the overall performance of CS-MUSIC. Furthermore, unlike the single measurement CS counterpart that requires infinite SNR for perfect support recovery, we can derive an information theoretic sufficient condition for perfect recovery using CS-MUSIC under a finite SNR scenario.

I. Introduction 

One of the important areas of compressed sensing research is the so-called multiple measurement vector (MMV) problem [1-3,5-9]. The MMV problem addresses the recovery of a set of sparse signal vectors that share a common non-zero support. In MMV, thanks to the common sparse support, it is quite predictable that the recoverable sparsity level may increase with the number of measurement vectors. However, the performance of the existing MMV compressive sensing algorithms is not generally satisfactory, even for the noiseless case, when only a finite number of snapshots is available.

A recent breakthrough in this area has created a new class of algorithms such as compressive MUSIC (CS-MUSIC), proposed by our group [5], or subspace-augmented MUSIC (SA-MUSIC), proposed independently [8]. Specifically, when the number of targets is $k$ and $r$ independent snapshots are available, compressive MUSIC finds $k - r$ targets using a compressive sensing algorithm such as S-OMP or p-thresholding, and the remaining $r$ targets are recovered using a generalized MUSIC criterion [5]. This hybridization significantly improves the performance of estimating jointly sparse signals and achieves the sparse recovery bound using a finite number of snapshots. Furthermore, even if the sparsity level is not known a priori, compressive MUSIC can accurately estimate the sparsity level using the generalized MUSIC criterion. In spite of its success, one of the main shortcomings of CS-MUSIC or SA-MUSIC is that the overall performance is heavily dependent upon the success of the first $k - r$ support estimation. This is especially problematic when the measurement is so noisy, or the RIP condition for the sensing matrix is so bad, that the greedy $k - r$ update step may produce an incorrect support estimate.

One of the main contributions of this paper is, therefore, to relax this stringent requirement. In particular, the new algorithm requires only that $k - r + 1$ supports (not necessarily in sequential order) out of a $k$-support estimate are correct, rather than that $k - r$ consecutive support estimates are correct. The location of the unknown $k - r$ true supports can then be readily estimated using a subspace fitting criterion. Such optimized partial support estimates can significantly improve the accuracy of the generalized MUSIC step, and hence the overall performance of compressive MUSIC.

The paradigm shift from early termination of a CS algorithm after $k - r$ steps to selecting the correct $k - r$ supports out of a $k$-sparse solution by any CS algorithm is much more significant and fundamental than a mere algorithmic improvement. In particular, by converting the problem into a partial support recovery problem, we can adapt the rich information theoretic analysis tools that have been developed for single measurement vector CS (SMV-CS) [12]. In particular, we can derive an information theoretic sufficient condition for the perfect recovery of CS-MUSIC under a finite SNR scenario, which was considered not feasible in SMV-CS [12].

II. Problem Formulation and Mathematical 
Preliminaries 

Throughout the paper, $x^i$ and $x_j$ correspond to the $i$-th row and the $j$-th column of the matrix $X$, respectively. When $S$ is an index set, $X^S$ and $A_S$ correspond to the submatrices collecting the corresponding rows of $X$ and columns of $A$, respectively. The following canonical MMV formulation is very useful for our analysis.

Definition 2.1 (Canonical form MMV [5]): Let $m$, $n$ and $r$ be positive integers ($m < n$) that represent the number of sensor elements, the ambient space dimension, and the number of snapshots, respectively. Suppose that we are given a multiple-measurement vector $B \in \mathbb{R}^{m \times r}$, $X = [x_1, \cdots, x_r] \in \mathbb{R}^{n \times r}$, and a sensing matrix $A \in \mathbb{R}^{m \times n}$. A canonical form MMV problem is given by the following optimization problem:

$$\text{minimize } \|X\|_0 \quad \text{subject to } B = AX, \qquad (1)$$

where $\|X\|_0 = |\mathrm{supp}X|$, $\mathrm{supp}X = \{1 \le i \le n : x^i \ne 0\}$, $x^i$ is the $i$-th row of $X$, and the measurement matrix $B$ is full rank, i.e. $\mathrm{rank}(B) = r \le \|X\|_0$.

Note that the canonical form MMV has the additional constraint that $\mathrm{rank}(B) = r \le \|X\|_0$. This is not problematic at all since every MMV problem can be converted into canonical form using the singular value decomposition [5]. Now, the following theorem provides the $\ell_0$ sparse recovery bound from noiseless measurements.

Theorem 2.1 ($\ell_0$ Bound): [1-3] Let $\mathrm{spark}(A)$ denote the smallest number of linearly dependent columns of $A$. Then, $X \in \mathbb{R}^{n \times r}$ is the unique solution of $AX = B$ if and only if

$$\|X\|_0 < \frac{\mathrm{spark}(A) + \mathrm{rank}(B) - 1}{2} \le \mathrm{spark}(A) - 1. \qquad (2)$$
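As a small worked example of how the bound (2) grows with the number of snapshots, the following sketch evaluates the largest sparsity level that satisfies the strict inequality. The function name is hypothetical, and the choice spark(A) = m + 1 is an assumption that holds when the columns of A are in general position; the values m = 40 and r = 9 are the ones used in the simulation section.

```python
def max_recoverable_sparsity(spark: int, rank_b: int) -> int:
    """Largest integer k satisfying k < (spark + rank_b - 1) / 2, i.e. bound (2)."""
    # Integer form of the strict inequality 2k < spark + rank_b - 1.
    return (spark + rank_b - 2) // 2


m = 40           # number of sensor elements (value from the simulation section)
spark = m + 1    # assumption: columns of A are in general position
for r in (1, 9):
    # r = 1 is the SMV case; r = 9 matches the simulation setup.
    print(r, max_recoverable_sparsity(spark, r))
# prints "1 20" and then "9 24"
```

Under these assumptions, moving from a single snapshot to nine snapshots raises the uniquely recoverable sparsity from 20 to 24.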



III. Compressive MUSIC 



Consider a canonical form MMV problem. Suppose, furthermore, that the columns of a sensing matrix $A \in \mathbb{R}^{m \times n}$ are in general position; that is, any collection of $m$ columns of $A$ is linearly independent. Then, according to [3, 13], for any $j \in \{1, \cdots, n\}$, $j \in \mathrm{supp}X$ if and only if

$$Q^* a_j = 0, \qquad (3)$$

where $Q \in \mathbb{R}^{m \times (m-r)}$ consists of orthonormal columns such that $Q^* B = 0$, so that $R(Q)^\perp = R(B)$; $R(Q)$ is often called the "noise subspace".

Note that the MUSIC criterion (3) holds for all $m \ge k + 1$ if the columns of $A$ are in general position. Using compressive sensing terminology, this implies that the sparsity level recoverable by MUSIC (with probability 1 for the noiseless measurement case) is given by

$$\|X\|_0 < m = \mathrm{spark}(A) - 1, \qquad (4)$$

where the last equality comes from the definition of the spark. Therefore, the $\ell_0$ bound (2) can be achieved by MUSIC when $r = k$. However, for any $r < k$, the MUSIC condition (3) does not hold. This is a major drawback of MUSIC compared to the compressive sensing algorithms, which allow perfect reconstruction with extremely large probability by increasing the number of sensor elements $m$. This drawback of the conventional MUSIC can be overcome by the following generalized MUSIC criterion [5].

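Theorem 3.1: [5] Assume that $A \in \mathbb{R}^{m \times n}$, $X \in \mathbb{R}^{n \times r}$, and $B \in \mathbb{R}^{m \times r}$ satisfy $AX = B$. Furthermore, we assume that $\|X\|_0 = k$ and $A$ satisfies the RIP condition with the left RIP constant $\delta^L_{2k-r+1} < 1$. If we are given $I_{k-r} \subset \mathrm{supp}X$ with $|I_{k-r}| = k - r$ and $A_{I_{k-r}} \in \mathbb{R}^{m \times (k-r)}$, which consists of the columns whose indices are in $I_{k-r}$, then for any $j \in \{1, \cdots, n\} \setminus I_{k-r}$,

$$a_j^* \left[ P_{R(Q)} - P_{R(P_{R(Q)} A_{I_{k-r}})} \right] a_j = 0 \qquad (5)$$

if and only if $j \in \mathrm{supp}X$.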

Note that when $r = k$, the condition (5) is the same as the MUSIC criterion (3). Based on Theorem 3.1, we can develop the compressive MUSIC algorithm, which consists of the following steps.

• Step 1: Find $k - r$ indices of $\mathrm{supp}X$ by any MMV compressive sensing algorithm such as 2-thresholding or S-OMP.

• Step 2: Let $I_{k-r}$ be the set of indices taken in Step 1 and let $S = I_{k-r}$.

• Step 3: For $j \in \{1, \cdots, n\} \setminus I_{k-r}$, calculate the quantities $\eta(j) = a_j^* [P_{R(Q)} - P_{R(P_{R(Q)} A_{I_{k-r}})}] a_j$.

• Step 4: Arrange $\eta(j)$, $j \notin I_{k-r}$, in ascending order, choose the indices that correspond to the first $r$ elements, and put these indices into $S$.
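Steps 2 to 4 can be sketched numerically as follows. This is a minimal NumPy illustration for the noiseless case, not the authors' implementation: the helper names `orth_projector` and `generalized_music` are hypothetical, the projector onto the noise subspace is computed as the identity minus the projector onto R(B), and the partial support is assumed to be given and correct.

```python
import numpy as np


def orth_projector(M, tol=1e-10):
    """Orthogonal projector onto the column space R(M), computed via the SVD."""
    if M.shape[1] == 0:
        return np.zeros((M.shape[0], M.shape[0]))
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    U = U[:, s > tol * s.max()]          # drop numerically null directions
    return U @ U.T


def generalized_music(A, B, partial_support, r):
    """Extend a (k - r)-element partial support to k indices using criterion (5)."""
    m = A.shape[0]
    idx = list(partial_support)
    P_Q = np.eye(m) - orth_projector(B)            # projector onto the noise subspace R(Q)
    G = P_Q - orth_projector(P_Q @ A[:, idx])      # bracketed operator in (5)
    eta = np.einsum("ij,ij->j", A, G @ A)          # eta(j) = a_j^* G a_j for every column
    eta[idx] = np.inf                              # rank only indices outside the partial support
    rest = np.argsort(eta)[:r]                     # the r smallest values of eta
    return sorted(set(idx) | set(rest.tolist()))
```

With a random Gaussian sensing matrix and a correct partial support, the r smallest values of eta are zero up to rounding, so the function returns the full support in the noiseless case.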

In compressive MUSIC, we determine $k - r$ indices of $\mathrm{supp}X$ with CS-based algorithms such as 2-thresholding or S-OMP, where the exact identification of the $k - r$ indices is a probabilistic matter. After that process, we recover the remaining $r$ indices of $\mathrm{supp}X$ with the generalized MUSIC criterion given in Theorem 3.1, and this reconstruction process is deterministic. This hybridization makes compressive MUSIC applicable for all ranges of $r$, outperforming all the existing methods.

So far, we introduced the compressive MUSIC algorithm. 
To analyze the performance of the compressive MUSIC, we 
find the number of measurements with which we can identify 
the support of X by using compressive MUSIC with S-OMP. 
For this purpose, we consider the large system limit so that 
we assume the following conditions. 

• Let $\rho := \lim_{n\to\infty} m(n)/n$ exist. We call $\rho$ the asymptotic under-sampling rate.

• Let $\epsilon := \lim_{n\to\infty} k(n)/n$ exist. We call $\epsilon$ the asymptotic sparsity.

• Let $\alpha := \lim_{n\to\infty} r(n)/k(n)$ exist and $\gamma := \lim_{n\to\infty} \sqrt{k/m}$.

Now, we may consider two cases according to the number of multiple measurement vectors. First, we consider the case when the number of multiple measurement vectors is a fixed finite number. Conventional compressive sensing (the SMV problem) is an instance of this case. Second, we consider the case when $r$ is proportional to $n$. This case includes the conventional MUSIC case. To analyze S-OMP, we assume that each element of $A$ is an i.i.d. Gaussian random variable $\mathcal{N}(0, 1/m)$. In analyzing S-OMP, rather than analyzing the distribution of $\|a_j^* P^\perp_{R(A_{I_t})} B\|_F^2$, where $I_t$ denotes the set of indices chosen in the first $t$ steps of S-OMP, we consider the following version of subspace S-OMP due to its better performance [2, 8].

• Step 1: Initialize $t = 0$ and $I_0 = \emptyset$.

• Step 2: Compute $P^\perp_{R(A_{I_t})}$, which is the projection operator onto the orthogonal complement of the span of $\{a_j : j \in I_t\}$.

• Step 3: Compute $P^\perp_{R(A_{I_t})} B$ and, for all $j = 1, \cdots, n$, compute $\rho(t,j) = \|a_j^* P_{R(P^\perp_{R(A_{I_t})} B)}\|_F^2$.

• Step 4: Take $j_t = \arg\max_{j=1,\cdots,n} \rho(t,j)$ and $I_{t+1} = I_t \cup \{j_t\}$; if $t < k - r$, return to Step 2.

• Step 5: The final estimate of the $k - r$ elements of the support is $I_{k-r}$.
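The subspace S-OMP iteration above can be sketched in a few lines of NumPy. This is an illustrative reading of Steps 1 to 5 under stated assumptions, not the authors' code: the names `orth_projector` and `subspace_somp` are hypothetical, the number of greedy steps is passed explicitly (k - r in the text), and already selected indices are excluded from the arg max, which the text leaves implicit.

```python
import numpy as np


def orth_projector(M, tol=1e-10):
    """Orthogonal projector onto the column space R(M), computed via the SVD."""
    if M.shape[1] == 0:
        return np.zeros((M.shape[0], M.shape[0]))
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    U = U[:, s > tol * s.max()]
    return U @ U.T


def subspace_somp(A, B, num_steps):
    """Greedy selection of num_steps indices following Steps 1-5."""
    m = A.shape[0]
    selected = []                                             # Step 1: I_0 is empty
    for _ in range(num_steps):
        P_perp = np.eye(m) - orth_projector(A[:, selected])   # Step 2
        P_res = orth_projector(P_perp @ B)                    # projector onto R(P_perp B)
        scores = np.sum((P_res @ A) ** 2, axis=0)             # Step 3: rho(t, j) for all j
        scores[selected] = -np.inf                            # assumption: never reselect
        selected.append(int(np.argmax(scores)))               # Step 4
    return selected                                           # Step 5
```

Scoring against the projector onto the residual subspace, rather than against the residual itself, is what distinguishes this variant from plain S-OMP.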

Theorem 3.2: Assume that we have multiple measurements $B = AX$ where each element of $A$ is generated from i.i.d. $\mathcal{N}(0, 1/m)$ and $N$ is an additive noise. Then, in the large system limit, with probability 1, we can identify $k - r$ elements of the support of $X$ with subspace S-OMP if we have one of the following conditions:

1. $r$ is a fixed finite number and

$$m > k(1+\delta)\frac{2\log(n-k)}{r}$$

for some $\delta > 0$.

2. $r$ satisfies $\lim_{n\to\infty} (\log n)/r = 0$, $\lim_{n\to\infty} r/k = \alpha$ and

$$m > k(1+\delta)^2 [2 - F(\alpha)]^2$$

for some $\delta > 0$, where

$$F(\alpha) = \frac{1}{\alpha} \int_0^{4 t_1(\alpha)^2} x \, d\lambda_1(x),$$

$d\lambda_1(x) = \frac{\sqrt{(4-x)x}}{2\pi x}\,dx$ is the probability measure with support $[0,4]$, and $0 \le t_1(\alpha) \le 1$ satisfies

$$\int_0^{4 t_1(\alpha)^2} d\lambda_1(x) = \alpha.$$

Here, $F(\alpha)$ is an increasing function on $(0, 1]$ such that $F(1) = 1$ and $\lim_{\alpha \to 0^+} F(\alpha) = 0$.

Proof: See Appendix A. ■ 

By the above theorem, the number of measurements for S-OMP shows different characteristics according to the number of measurement vectors. First, if we have a small number of multiple measurement vectors, then the number of samples for S-OMP is inversely proportional to the number of multiple measurement vectors. On the other hand, if we have a sufficiently large number of snapshots such that $\lim_{n\to\infty} (\log n)/r$ is close to 0, then the number of measurements for S-OMP varies from $4k$ to $k$ according to the ratio of $r$ and $k$, so that the $\log n$ factor is not necessary. In particular, if the number of snapshots approaches the sparsity $k$, then we can identify the indices of $\mathrm{supp}X$ with only $(1+\delta)k$ measurements, where $\delta$ is any small positive number, which is equivalent to the number of measurements required for the success of the conventional MUSIC algorithm.
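To make the range "from 4k to k" concrete, the following sketch evaluates F(α) and the second bound of Theorem 3.2 numerically. It is an illustration under stated assumptions: the closed forms below are obtained by substituting x = 4s² in the integrals against dλ₁ (this substitution is not in the text), t₁(α) is found by bisection, and the function names are hypothetical.

```python
import math


def mp_cdf(t: float) -> float:
    """Mass that d lambda_1 assigns to [0, 4 t^2], for 0 <= t <= 1."""
    return (2 / math.pi) * (t * math.sqrt(1 - t * t) + math.asin(t))


def mp_partial_mean(t: float) -> float:
    """Integral of x d lambda_1(x) over [0, 4 t^2], for 0 <= t <= 1."""
    return (2 / math.pi) * (math.asin(t) - t * math.sqrt(1 - t * t) * (1 - 2 * t * t))


def F(alpha: float, iters: int = 80) -> float:
    """F(alpha) from Theorem 3.2; t_1(alpha) is located by bisection on mp_cdf."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mp_cdf(mid) < alpha:
            lo = mid
        else:
            hi = mid
    return mp_partial_mean(0.5 * (lo + hi)) / alpha


def min_measurements(k: int, alpha: float, delta: float = 0.0) -> float:
    """Right-hand side k (1 + delta)^2 [2 - F(alpha)]^2 of the second condition."""
    return k * (1 + delta) ** 2 * (2 - F(alpha)) ** 2


for alpha in (0.001, 0.25, 0.5, 1.0):
    print(alpha, min_measurements(10, alpha))
```

For k = 10 and δ = 0, the printed bound decreases from about 40 (α close to 0) to about 10 (α = 1), matching the 4k-to-k range described above.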

Furthermore, in [5], we developed the analysis for the noisy setting, where we showed that the SNR required for the success of support recovery decreases when the asymptotic ratio of the number of snapshots to the sparsity level (that is, $\lim_{n\to\infty} r/k$) increases, in the large system limit. This is one of the important advantages of MMV over SMV.

IV. Optimized partial support selection 

As discussed before, the performance of compressive MUSIC depends strongly on the selection of $k - r$ correct indices of the support of $X$. Note that this is a very stringent condition. In practice, even though the consecutive $k - r$ steps of S-OMP may not all be correct, there are chances that, among the $k$-sparse solution of S-OMP, part of the support is correct. Hence, if the estimate of the support of $X$ contains at least $k - r$ indices of the support of $X$ and we can identify them, then we can expect the performance of compressive MUSIC to improve. When $\binom{k}{k-r}$ is small, we may apply an exhaustive search, but if both $k - r$ and $r$ are not small, the exhaustive search is hard to apply, so we have to find an alternative method to identify the correct indices from the estimate of $\mathrm{supp}X$. Indeed, the following subspace fitting criterion addresses the problem.

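Theorem 4.1: Assume that we have a canonical MMV model $AX = B$ where $A \in \mathbb{R}^{m \times n}$, $X \in \mathbb{R}^{n \times r}$, $\|X\|_0 = k$ and $r < k < m < n$. If there is an index set $I_k \subset \{1, \cdots, n\}$ such that $|I_k| = \min\{k, \mathrm{spark}(A) - r\}$ and $|I_k \cap \mathrm{supp}X| \ge k - r + 1$, then for any $j \in I_k$, $j \in \mathrm{supp}X$ if and only if

$$P_{Q_{k,j}} a_j = 0, \qquad (6)$$

where $Q_{k,j}$ is the orthogonal complement of $R([B \; A_{I_k \setminus \{j\}}])$, $A_{I_k \setminus \{j\}}$ consists of the columns of $A$ whose indices belong to $I_k \setminus \{j\}$, and $P_{Q_{k,j}} = P_{R([B \; A_{I_k \setminus \{j\}}])^\perp}$ is the orthogonal projection onto $R([B \; A_{I_k \setminus \{j\}}])^\perp$.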

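Proof: Assume that $j \in I_k \cap \mathrm{supp}X$. Then $|(I_k \setminus \{j\}) \cap \mathrm{supp}X| \ge k - r$ so that

$$R([B \; A_{I_k \setminus \{j\}}]) \supseteq R([B \; A_{I_{j,k-r}}]) \cap R(A_S) = R(A_S)$$

where $I_{j,k-r} \subset (I_k \setminus \{j\}) \cap S$, $|I_{j,k-r}| = k - r$ and $S = \mathrm{supp}X$. Since $a_j \in R(A_S)$, (6) holds for $j \in I_k \cap \mathrm{supp}X$.

To show the converse, assume that (6) holds for some $j \in I_k$. Then we have $a_j \in R([B \; A_{I_k \setminus \{j\}}])$, that is, there are some $p \in \mathbb{R}^r$ and $q \in \mathbb{R}^{|I_k| - 1}$ such that

$$a_j = Bp + A_{I_k \setminus \{j\}} q = AXp + A_{I_k \setminus \{j\}} q.$$

Since $|(\mathrm{supp}X) \cup I_k| \le k + |I_k| - (k - r + 1) \le k + \mathrm{spark}(A) - r - (k - r + 1) = \mathrm{spark}(A) - 1$, if $j \notin \mathrm{supp}X$, then there is a $y \in \mathbb{R}^n \setminus \{0\}$ such that $\|y\|_0 < \mathrm{spark}(A)$ and $Ay = 0$, since $j \notin \mathrm{supp}X \cup (I_k \setminus \{j\})$. By the definition of $\mathrm{spark}(A)$, this is a contradiction, so that $j \in \mathrm{supp}X$ if (6) holds. ■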
In particular, if the columns of $A$ are in general position, then we can take an index set $I_k$ with $|I_k| = \min\{k, m - r + 1\}$. Also, if $A$ satisfies an RIP condition with $\delta_{2k} < 1$, then we can take $|I_k| = k$ since $r < k$. Theorem 4.1 then informs us that we only require partial support recovery rather than $k - r$ consecutive correct CS steps [5]. Accordingly, compressive MUSIC with optimized partial support is performed by the following procedure.

• Step 1: Let $S = \emptyset$.

- If $r < k$, estimate $k$ indices of $\mathrm{supp}X$ by an MMV compressive sensing algorithm.

- If $r = k$, go to Step 5.

• Step 2: Let $I_k$ be the set of indices taken in Step 1.

• Step 3: For $j \in I_k$, calculate the quantities $\zeta(j) = \|P_{Q_{k,j}} a_j\|^2$.

• Step 4: Arrange $\zeta(j)$, $j \in I_k$, in ascending order, choose the indices that correspond to the first $k - r$ elements, and put these indices into $S$.

• Step 5: For $j \in \{1, \cdots, n\} \setminus S$, calculate the quantities $\eta(j) = a_j^* [P_{R(Q)} - P_{R(P_{R(Q)} A_S)}] a_j$.

• Step 6: Arrange $\eta(j)$, $j \notin S$, in ascending order, choose the indices that correspond to the first $r$ elements, and put these indices into $S$.
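Steps 3 and 4 of the procedure, the subspace fitting selection, can be sketched as follows. This is a minimal NumPy illustration for the noiseless case under stated assumptions: the names `orth_projector` and `select_partial_support` are hypothetical, and the projector onto the orthogonal complement of the range of [B, A restricted to I_k without j] is formed as the identity minus an SVD-based range projector.

```python
import numpy as np


def orth_projector(M, tol=1e-10):
    """Orthogonal projector onto the column space R(M), computed via the SVD."""
    if M.shape[1] == 0:
        return np.zeros((M.shape[0], M.shape[0]))
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    U = U[:, s > tol * s.max()]          # rank-revealing: drop null directions
    return U @ U.T


def select_partial_support(A, B, I_k, r):
    """Keep the k - r indices of I_k with the smallest subspace fitting residual zeta(j)."""
    m = A.shape[0]
    I_k = list(I_k)
    zeta = []
    for j in I_k:
        others = [i for i in I_k if i != j]
        M = np.hstack([B, A[:, others]])                      # [B, A_{I_k \ {j}}]
        P_perp = np.eye(m) - orth_projector(M)                # projector onto Q_{k,j}
        zeta.append(float(np.sum((P_perp @ A[:, j]) ** 2)))   # zeta(j) from Step 3
    order = np.argsort(zeta)[: len(I_k) - r]                  # Step 4: smallest k - r values
    return sorted(I_k[i] for i in order)
```

In the noiseless case, if the k-index estimate contains at least k - r + 1 true support indices, every true index in it has a residual of zero up to rounding, so the k - r indices returned all lie in the support; they can then be passed to the generalized MUSIC step.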

In the above algorithm, we require partial correctness of the support estimate instead of exactness of $k - r$ consecutive support estimates. Moreover, Step 1 of the above algorithm need not be greedy, so we can also apply convex optimization algorithms such as $\ell_{2,1}$ minimization [9] or belief propagation [7].

So far, we have assumed that the measurement $B$ is noiseless. In the case of noisy measurements, $B$ is corrupted, so the optimized partial support selection is affected by noise. Although we do not discuss the noise sensitivity in this paper, this issue will be investigated in future work.

V. Information theoretic analysis for partial 

SUPPORT RECOVERY FOR MMV 

From the above section, we know that compressive MUSIC with optimized partial support can tolerate a fractional distortion of the support estimate of less than $\alpha$ and still guarantee exact recovery in the large system limit. Therefore, in this section, we are interested in finding a sufficient condition under which we can find an estimate of the support with fractional distortion less than $\alpha$ in an MMV step. Here, we consider the linear model in which the multiple measurement $Y \in \mathbb{R}^{m \times r}$ is given as

$$Y = AX + N$$

where $A \in \mathbb{R}^{m \times n}$ is a sensing matrix and $N \in \mathbb{R}^{m \times r}$ is additive noise whose columns are i.i.d. with distribution $\mathcal{N}(0, \sigma_w^2 I)$. Also we assume that $X$ has $k$ nonzero rows which are indexed by the set $S$ and that $S$ is distributed uniformly over the $\binom{n}{k}$ possibilities. Again, we assume that the columns of $X$ are independent and identically distributed. Furthermore, we assume that the elements of the sensing matrix $A$ are i.i.d. $\mathcal{N}(0, 1/n)$. Here we consider the large system limit. Also, we use the following definition of SNR.

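Definition 5.1: For a given multiple signal $X$, the SNR is given by

$$\mathrm{SNR}(X) := \frac{E[\|AX\|_F^2]}{E[\|N\|_F^2]}.$$

Also, for a stochastic signal class $\mathcal{X}$, $\underline{\mathrm{SNR}}(\mathcal{X})$ is called an asymptotic lower bound on $\mathrm{SNR}(X)$ if there exists a constant $c > 0$ such that

$$P\{\mathrm{SNR}(X(n)) \ge \underline{\mathrm{SNR}}(\mathcal{X})\} \ge 1 - e^{-nc}.$$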

The analysis for partial support recovery uses the information theoretic approach of [12], for which we define the following function.

Definition 5.2: For $\epsilon \in [0, 1]$ and $\alpha \in [0, 1 - \epsilon]$, we define

$$h(\epsilon, \alpha) = \epsilon h(\alpha) + (1 - \epsilon) h\!\left(\frac{\epsilon\alpha}{1 - \epsilon}\right)$$

where $h(p) = -p \log p - (1 - p) \log(1 - p)$ is the binary entropy function.
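The role of the two-argument entropy can be checked numerically: n·h(ε, a/k) approximates the logarithm of the number of size-k index sets that miss exactly a of the k true support indices, a count that appears in the proof in Appendix B. The sketch below assumes the reading h(ε, α) = ε h(α) + (1 - ε) h(εα/(1 - ε)) with natural logarithms; the function names are hypothetical.

```python
import math


def binary_entropy(p: float) -> float:
    """h(p) = -p log p - (1 - p) log(1 - p), in nats, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)


def h2(eps: float, alpha: float) -> float:
    """Two-argument entropy h(eps, alpha) in the assumed form of Definition 5.2."""
    return eps * binary_entropy(alpha) + (1 - eps) * binary_entropy(eps * alpha / (1 - eps))


def log_count_rate(n: int, k: int, a: int) -> float:
    """(1/n) log of the number of size-k sets U with exactly a true indices missing."""
    return (math.log(math.comb(k, a)) + math.log(math.comb(n - k, a))) / n


n, k, a = 10000, 1000, 200
print(log_count_rate(n, k, a), h2(k / n, a / k))
```

For these values the two printed numbers agree to about three decimal places, and the gap shrinks further as n grows.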

For a fractional distortion $\alpha > 0$, we define the fractional partial recovery with distortion rate $\alpha$ by the requirement $d(S, \hat{S})/k \le \alpha$, where $\hat{S}$ is the estimate of the support of $X$ such that $|\hat{S}| = k$ and $d(S, \hat{S}) = |S \setminus \hat{S}|$. If $\alpha > 1 - \epsilon$, the random guessing estimator $\hat{S}_{RG}$ is asymptotically reliable, so we assume that $\alpha < 1 - \epsilon$ [12].

For the analysis, we consider the maximum likelihood (ML) estimator, which is given by

$$\hat{S}_{ML}(Y) = \arg\min_{|U| = k} \|P^\perp_{R(A_U)} Y\|_F^2$$

where $P^\perp_{R(A_U)}$ is the projection operator onto the orthogonal complement of $R(A_U)$. For a $k$-sparse multiple input signal $X \in \mathbb{R}^{n \times r}$, we introduce the following term.

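Definition 5.3: Let $Z$ correspond to the nonzero rows of $X$ and satisfy $\|z^1\|_2 \le \|z^2\|_2 \le \cdots \le \|z^k\|_2$. Then, for some $\alpha \le 1$, we let

$$g(\alpha, X) = \frac{1}{\alpha \|X\|_F^2} \sum_{i=1}^{\lceil \alpha k \rceil} \|z^i\|_2^2.$$

Also, for a stochastic signal class $\mathcal{X}$, let $\underline{g}(\alpha, \mathcal{X})$ be the asymptotic lower bound on $g(\alpha, X)$ if there is a constant $c > 0$ such that

$$P\{g(\alpha, X(n)) \ge \underline{g}(\alpha, \mathcal{X})\} \ge 1 - e^{-nc}.$$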

In [12], Reeves and Gastpar gave sufficient conditions for 
partial support recovery for SMV problem using ML estimator. 
We can extend those results to the MMV problem as the 
following theorem. 

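Theorem 5.1: For a given signal class $\mathcal{X}$, sparsity $\epsilon \in (0, 1)$, undersampling ratio $\rho < 1$, and fractional distortion $\alpha \in (0, 1 - \epsilon)$, the estimator $\hat{S}_{ML}$ is asymptotically reliable if

$$\underline{\mathrm{SNR}}(\mathcal{X}) > \frac{1}{\alpha\, \underline{g}(\alpha, \mathcal{X})} \qquad (7)$$

and

$$\rho > \epsilon + \frac{1}{r} \max_{u \in [\alpha, 1-\epsilon]} \frac{2 h(\epsilon, u)}{\log(\gamma(u, \mathcal{X})) + \gamma(u, \mathcal{X})^{-1} - 1} \qquad (8)$$

where $\gamma(u, \mathcal{X}) = \underline{\mathrm{SNR}}(\mathcal{X})\, u\, \underline{g}(u, \mathcal{X})$.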

Proof: See Appendix B. ■ 

Note that if $\alpha > 1 - \epsilon$, the random guessing estimator is asymptotically reliable, so that we can identify the support with distortion less than $\alpha$ with large probability in the large system case by augmenting a randomly chosen $k - r$ support in the generalized MUSIC step. Moreover, in this case, the sufficient condition becomes $\rho > \epsilon$, which is equivalent to the condition for MUSIC with full rank measurements.
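To see how the number of snapshots r enters the sufficient condition, the following sketch evaluates the right-hand side of (8), read as ε + (1/r) max over u in [α, 1 - ε] of 2h(ε, u) / (log γ + 1/γ - 1) with γ(u) = SNR · u · g(u). This is an illustration under stated assumptions: the maximum is approximated on a uniform grid, the default g(u) = 1 is a hypothetical signal class chosen only for the example, and the function names are not from the paper.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy in nats, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)


def h2(eps: float, alpha: float) -> float:
    """Two-argument entropy h(eps, alpha) in the assumed form of Definition 5.2."""
    return eps * binary_entropy(alpha) + (1 - eps) * binary_entropy(eps * alpha / (1 - eps))


def sufficient_rate(eps, alpha, r, snr, g=lambda u: 1.0, grid=2000):
    """Grid approximation of the right-hand side of (8)."""
    if snr * alpha * g(alpha) <= 1.0:
        raise ValueError("SNR condition (7) is violated")
    worst = 0.0
    for i in range(grid + 1):
        u = alpha + (1 - eps - alpha) * i / grid      # u ranges over [alpha, 1 - eps]
        gam = snr * u * g(u)                          # gamma(u, X) = SNR * u * g(u)
        worst = max(worst, 2 * h2(eps, u) / (math.log(gam) + 1 / gam - 1))
    return eps + worst / r


for r in (1, 3, 9):
    print(r, sufficient_rate(eps=0.1, alpha=0.1, r=r, snr=1e4))
```

In this reading, the excess of the required sampling rate over the sparsity ε scales as 1/r, so additional snapshots push the sufficient condition toward ρ > ε.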



In addition, in [12], Reeves and Gastpar gave necessary 
conditions for partial support recovery for SMV problem. The 
counterpart for MMV can be given by the following theorem. 
For the proof, see Appendix C. 

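Theorem 5.2: For a given stochastic signal class $\mathcal{X}$, sparsity $\epsilon \in (0, 1)$, sampling rate $\rho < 1$ and fractional distortion $\alpha \in (0, 1 - \epsilon)$, a necessary condition for asymptotically reliable recovery is

$$\rho \ge \frac{h(\epsilon) - h(\epsilon, \alpha) + I(X; Y|S)/n}{\frac{1}{2} \sum_{l=1}^{r} \log\left(1 + \kappa_l(\mathcal{X})/\sigma_w^2\right)}$$

where $I(X; Y|S)$ is the mutual information between $X$ and $Y$ conditioned on $S$, and $\kappa_l(\mathcal{X})$ is the asymptotic upper bound for the $l$-th largest eigenvalue of $X^* X$, where $X \in \mathcal{X}$.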

VI. Numerical Simulation 

We compared the performance of compressive MUSIC with optimized partial support (the proposed algorithm), compressive MUSIC (CS-MUSIC), subspace-augmented MUSIC (SA-MUSIC) and S-OMP. We used S-OMP as the MMV compressive sensing algorithm for the various hybrid MMV algorithms. In order to quantify the performance of each algorithm, we calculated the empirical recovery ratio, defined as the percentage of trials in which all supports are correctly identified; the ratios are averaged over 5000 simulation runs. The simulation parameters are as follows: $m = 40$, $n = 100$, the number of measurement vectors is $r = 9$, and $k = 1, 2, \cdots, 20$. Each component of the sensing matrix $A$ is generated as an i.i.d. Gaussian random variable $\frac{1}{\sqrt{m}}\mathcal{N}(0, 1)$ or $\frac{1}{\sqrt{m}}\mathcal{N}(1, 1)$ to see the effect of RIP on each algorithm. Gaussian noise at SNR = 40 dB is added to the measurement vector $B$. In Figure 1, we can observe that the proposed method shows significantly better performance than the original version of compressive MUSIC, SA-MUSIC and S-OMP. In particular, the proposed method is more robust to bad RIP of the sensing matrix, in which case the performance gain is more prominent.

VII. Conclusion 

This paper proposed a mathematical framework for optimized partial support selection to improve the performance of compressive MUSIC for joint sparse recovery. We first discussed the original compressive MUSIC algorithm, and derived the sharp bound on the number of measurements for exact recovery using subspace S-OMP for partial support recovery. Then, we showed that the requirement of $k - r$ correct S-OMP steps can be relaxed: as long as $k - r + 1$ supports from a $k$-support estimate are correct, the subspace fitting criterion can identify the correct $k - r$ support, which improves the robustness of the compressive MUSIC algorithm. Information theoretic analysis was also provided to obtain a sufficient condition for MMV joint sparse recovery using the compressive MUSIC algorithm. As future work, we will derive the SNR condition for the success of the subspace fitting step.
Fig. 1. Recovery rates when $m = 40$, $r = 9$, SNR = 40 dB and $A$ is generated from (a) $\frac{1}{\sqrt{m}}\mathcal{N}(0, 1)$ and (b) $\frac{1}{\sqrt{m}}\mathcal{N}(1, 1)$.
Appendix A: Number of measurements for Compressive MUSIC with subspace S-OMP

In this section, we assume the large system limit, so that $\rho$, $\epsilon$, $\alpha$ and $\gamma$ exist. We will use the following theorem, which gives the asymptotic distribution of the singular values of Gaussian random matrices.

Theorem 7.1: [10] Suppose that each entry of $A \in \mathbb{R}^{m \times k}$ is generated from an i.i.d. Gaussian random variable $\mathcal{N}(0, 1/m)$. Then the probability density of the squared singular values of $A$ is given by


where the inequality comes from Lemma 7.2. Substituting (13) 
into (10), we have 



lim 

n— too 



a 



3=1 

(l-7) 2 +7 : 



k 



a 



= -(l/ 7 -l) 2 +a 7 F(a) 



m 



$$d\lambda_\gamma(x) = \frac{\sqrt{((1+\gamma)^2 - x)(x - (1-\gamma)^2)}}{2\pi\gamma^2 x}\,dx. \qquad (9)$$

Using the above theorem, we prove Theorem 3.2.

Proof of Theorem 3.2: First, for j ^ suppAT, since a.j is 
statistically independent of P R t Al j-B. For t < k — r, the 
dimension of P R ( Al ^B is r so that m\\sLjP R ^ P ± } b)\\ 2 i s 
of chi-squared distribution of degree of freedom r. 

On the other hand, for j G suppJf, we have 



where 

F(a) := (1/a) f 
Jo 



1V I f 4ti(a) 2 , . 

Jo dX i(s) 



is an increasing function with respect to a such that 

lim a _j.o F( a ) = and a(l) = 1. 

Then we consider two limiting cases according to the number 
of measurement vectors. 

(Case 1) For t < k-r, {m\\**P R(P ^ )B) \\ 2 : j $ suppX} 
are independent chi-squared random variables of degree of 



max \\a*P R , P ± B) \\ > -\\A* S P R(P ± r)As\\f freedom r so that by Lemma 7.4, we have 

jGsuppX J v R(A It ) ' v n.(Aj t ) 1 



E crUAs) 



> 



lim max 



™\\a*P R(Pi L (Ait)B) \\ 2 



since R(P^ Ai ,B) C R(As), where As have singular values 
< (7i < (72 < ■ ■ ■ < <Tk- Then by (9), we have 



lim 

n— > 00 



i=i 



(1-7)2 



a;rfA 7 (a;) 



fc 



n^oo j^suppX 2 log (n — fc) 

Here we assume that 

m>fc 2(l + *)log(n-fc) 
r 

Then by Marcenko-Pastur theorem [10], 

lim (1 - ^fkfmf 



1. 



(14) 



(15) 



(10) lim a min (A s ) 



where < i 7 (a) < 1 is the value satisfying 



"7 

l- 7 +2 7 t 7 (a) 



ds 7 (x) 



'1-7 

If we let 



(l-7+2 7 t T (a)) 2 



> lim fl - v/r/(21og(n- fc))) 2 = 1 

n— too \ / 



(1-7) 2 



d\ 7 (x) = a. 



so that 



lim inf max 



m H^ ( , ff) B)ll 2 



dx, 



1 yT^yi 

'< 7W ~ ^ 7S + (l- 7 ) 2 
then we have for any < t < 4, 

/ dAi(x) < f d\ .- ( (x) 
Jo Jo 

then by substitution with s = (x — (1 — 7 ) 2 )/ 7 , we have 

»(l-7+27Ma)) 2 

xe?A 7 (x) 

(1-7) 2 

(l-7 + 2 7 t T (o)) 2 -(l— t) 2 



00 jesuppx 2 log (n — fc) 



(11) 



(12) 



> lim inf 



m 



rwoo 2 log (n — fc) fc 
2 



> lim inf • 



n->oo 2 log (n — fc) y ; 



> 1 + 5. 



(16) 



> 



[(l- 7 ) 2 +7s]dAo, 7 ( S ) 

/•4ti(a) 2 

/ [(l- 7 ) 2 +75]rfA!( S ) 
Jo 

(l-jfa + j sdAi(a), (13) 

Jo 



Hence, when $r$ is a fixed number, if we have (15), then we can identify $k - r$ correct indices of $\mathrm{supp}X$ with subspace S-OMP, in the large system limit.

(Case 2) Similarly to the previous case, for $t < k - r$, $\{m \|a_j^* P_{R(P^\perp_{R(A_{I_t})} B)}\|^2 : j \notin \mathrm{supp}X\}$ are independent chi-squared random variables. Since $\lim_{n\to\infty} (\log n)/r = 0$, by Lemma 3 in [4], we have



lim max 

n-s-oo jgsuppX 



= 1. 



(17) 



On the other hand, for j e suppX, we have [a, b] such that for any t e [a, b], 

m\\a*P R(P ± Y) \\ 2 



lim inf max 

n— >oo jGsuppX 



f(x)dx > / fi{x)dx 



/i \ 2 and satisfy that 

> --1 + *'(«)■ ( 18 > 

V7 / > and f(x) > on (a, 6). 

We let 

' " ' „ 2 Then for any nonnegative increasing function q(x) on [a, b] 

m>k(l + 6) [2-F(a)\ (19) and for any (<Zlj e fe] x fe] such that 

for some 5 > 0. Note that (19) is equivalent to 



i>(l + 5)[2-F(a)} 

7 we have 



/ h{x)dx= / /(*)(&, (21) 

•/a -/a 



Again we let 

4 + 1 



u := F(a) and v :— 



f Ql g(x)f 1 (x)dx> f g(x)f(x)dx. (22) 

•J a J a 

Proof: First, we define 



aSNR min (B)-l' 

Then for a quadratic function Q(x) = (x — l) 2 + ux, if x > 

(l + (5)(2-u), then we have F^x) = h(t)dt and F(x) =/ /(t)dfc. 

Q(x) = x 2 - (2 - u)x + 1 = x [a; - (2 - u)} + 1 _ , , ^ , , ° , ^, , . , . ° . , 

Then both and i< (x) are strictly increasing functions so 

> 6(1 + S)(2- u) +1 > 1 + 5(1 + 5) (20) that their inverse functions exist and satisfy Ff 1 ^) > 

since < u < 1. Combining (18) and (20), we have for for any xe [0, 1]. For any (q u q) e [a, 6] x [a, 6] which satisfies 

< t < k - r and j E suppX, we have ( 21 )' there is some c e I ' ^ such that = F (l) = c - 

Applying the change of variable, we have 



m\\a*P R{Pk(Ai )Y) f 

lim inf max 



>^ + S(l + 5) f g(x)h(x)dx- f g 



(x)f(x)dx 



[ C [g(Fr 1 (s))-g(F-\s))}ds>0 
Jo 



n— >oo j'GsuppX T 

for some 5 > 0. Hence, in the case of lim„_ i . 00 r/k — a > 0, 
we can identify the correct indices of suppX if we have (19). 

Lemma 7.2: For < 7 < 1 and < a < 1, we let < since ^O*) > F'^x) for any x e [0,1] and g(x) is 
t 7 (a) < 1 which satisfies increasing on [a, b]. ■ 

r (i-7+27t T (a)) 2 Proof of Lemma 7.2 Noting that we have 



/ ds 1 (x) = a 

A1-7) 2 

where <iA 7 (x) is the probability measure which is given by 



a = I d\ 0n (s) = 

Jo Jo 



dXi(s), 



I -/7Jl -\- ^y)2 _ x ^ x — (1 — 'y) 2 ) by Lemma 7.3, we only need to show that 

c?A 7 (x) '. — ;y . 

7T7 X 



/ dA , 7 (s) > / dAi(s) 
Jo Jo 



Furthermore, we let dAo, 7 (x) is the probability measure which 

is given by for any t e [ 0; 4 ]_ Let and j( x ) be given by 

Then we have 1 - xJx 

/ [(1 - l) 2 + 72^0,7(20 Then we can see that 

Jo 

/•4ti(a) 2 f(x)>fi(x) for xe (0,1-7) 

^ j [(1-l?+lx]dM(x). andA(x)>/(x) for x € [1 - 7, !]• 

Since /i(x) and /(x) are probability density functions with 

For the proof of Lemma 7.2, we need the following lemma. suppojt ^ 4] SQ that we can easily see that for any 4 g [0> 4]> 



Lemma 7.3: Let —00 < a < 6 < 00. Suppose that /i(x) 
and /(x) are continouous probability density functions on 



f(x)dx > / fi(x)dx 



so that the claim holds. 

Lemma 7.4: Suppose that $r$ is a given number, and $\{u_j^{(n)}\}_{j=1}^{n}$ is a set of i.i.d. chi-squared random variables with $r$ degrees of freedom. Then

$$\lim_{n\to\infty} \max_{j=1,\cdots,n} \frac{u_j^{(n)}}{2 \log n} = 1$$

in probability.

Proof: Assume that $Z_r$ is a chi-squared random variable with $r$ degrees of freedom. Then we have

$$P\{Z_r > x\} = \frac{\Gamma(r/2, x/2)}{\Gamma(r/2)} \qquad (23)$$

where $\Gamma(k, z)$ denotes the upper incomplete Gamma function. Then we use the following asymptotic behavior:

1 



P{Z r > x} 



r(r/2) 



For n — > oo, we consider the probability P(maxi<j<„ uj™^ > 
2(1 + e) logn). By using union bound, we see that 

P( max u[ n) > 2(1 + e) logn) 

1< J<?1 J 

< n-l-(2(l + 6)logn)'-/ 2 - 1 e -( 1 + £ ) 1 °^ 
T(r/2) 

" f(^2) (2(1 + e)l0Sn)r/2 " ln " e ^ 

oo. Now, considering the probability 



as n 



P(maxi<j<„ Uj l> < 2(1 — e) logn), we see that 
P( max u ( - l) < 2(1 + e) logn) 

l<j<n J 



First, noting that $\min_{U \in G} \mathrm{err}(U) \le \mathrm{err}(S)$, we have $P(A_G) \le P(\mathrm{err}(S) > t)$. Since $N$ has zero-mean i.i.d. Gaussian columns, and $P^\perp_{R(A_S)}$ is an orthogonal projection matrix with rank $m - k$, the random variable $\mathrm{err}(S) = (1/\sigma_w^2) \|P^\perp_{R(A_S)} N\|_F^2$ has a chi-squared distribution with $r(m - k)$ degrees of freedom, since the $r$ columns of $N$ are independent.

Second, we consider P(A B ). We partition B by B = 
\J a a =a ,Ba where 

B(a) = {U : \U\ = k, \U n S\ = k - a}, 

a* = [afcj and a* = [(1 - e)k]. Then 

a* 

P(A B ) < 2 ^B(a)) 



where 

A B(a) = \ m i n eTT ( U ) < f f ' 

Then we need to quantify the distribution of err({7) for [/ e 
P(a). First, if we condition on the set S\U, the magnitude 
of the missed components of X is given by SNF^Xg^). 
Furthermore, for any U, A(U) := (VOll P fl(A u ) 7V llf' is a 
chi-squared random variable with r(m—k) degree of freedom 
by the independency of each column of N. Conditioned on 
SNRpf^ 17 ) = 9, the random vector (a 2 w 8)- 1 l 2 A S \ U ^ U 
has iid zero mean Gaussian random elements with variance 1, 
where xj is the j-th column of X. If we also add an another 
condition A(U) — A, then we see that 



< 



< 1- 



as n — » oo so that the claim is proved. ■ 

Appendix B: Proof of Theorem 5.1

The proof of Theorem 5.1 basically follows the line of [11], which provides the information theoretic analysis for partial support recovery with the maximum likelihood (ML) estimator. Let $P_e(\alpha)$ be the error probability with fractional distortion $\alpha$, and let $P_e(\alpha|S)$ be the same error probability conditioned on the true support set $S$. Since the sampling procedure is independent of $S$, for any distribution over $S$ we have $P_e(\alpha) = P_e(\alpha|S)$. Consider the sets

$$G = \{U : |U| = k, |U \cap S| \ge (1 - \alpha)k\},$$
$$B = \{U : |U| = k, |U \cap S| < (1 - \alpha)k\}.$$

Let $\mathrm{err}(U) = (1/\sigma_w^2) \|P^\perp_{R(A_U)} Y\|_F^2$. For any $t > 0$, we define two events

$$A_B = \{\min_{U \in B} \mathrm{err}(U) < t\}, \quad A_G = \{\min_{U \in G} \mathrm{err}(U) > t\}.$$

Then $P_e(\alpha|S) \le P(A_B) + P(A_G)$.



is a non-central chi-squared random variable with non- 
centrality parameter X/6 and degree of freedom r(m — fc). 
This implies that 

P{err(f7) < t|SNR(X sw ) = 9,A(U) = A} 

= p{ x 2 NC (r(m~k),\/e)<t/e}. 

By the Lemma A. 3 in [11], since A(U) > 0, we have 

P{en(U) < t\5NR(X sxu ) = 9} 

< P{ X 2 (r(m-k))<t/6} 

using Xjvc( r ( TO — k),0) = x 2 (r(m — kj). Hence we have 

P{err(L/) < t|SNR(X sw > 9)} 

< P{x 2 (r(m-k))<t/8}. 

Then 

P{A B[a) ) < P(SNR < 0) + ]T P{ X 2 {r{m-k))<t/9}. 

UeB(a) 

By the definition of SNR and g(a/k,X), 
SNR(X s\u) 



ag(a/k,X) 



> SNR(X). 



By the definition of g(a/k, X) and SNR(A'), there is a c > 
such that 

P{ min SNR(X SW ) < l/({a)} < e~ nc ° 

UeB(a) 

where ((a) = [SNR(X)(a/k)g(a/k, X)] -1 . Hence 

P(As (a) ) < e- nC0 + ]T P{ X 2 (r(m-k))<C(a)t} 

UeB(a) 



N is i.i.d. Gaussian with covariance matrix a^I and 
E((AX + N)*(AX + iV)) = X*X + all 
for a given X, we can obtain an upper bound of I(B; Y) as 

I(B;Y) < \ log [dot (I + \ X*X)] 



+ 



k\ (n 



P{ X 2 (r(m - k)) < C(a)t}. 



Reminding P e (a) < P(A G ) + P(A B ), we first bound P(A G ). 
For arbitrary v > 0, we choose t v = (1 + u)r{m — k). Then 
by Lemma 7.5, we have 

P{ X 2 (r{m - k)) > t v } < exp (-nf?i), 

where E\ := (p — t)v 2 /4. With v arbitrary close to 0, we 
consider the probability P(A B ). To use Lemma 7.5, we need 
the condition 

SNR(X) > max — -!— — = — -!— — . 

ue[a,i-e] ug(u, X) ag[a,X) 

If this condition is satisfied, then 

P{ X 2 (r(m - k)) < r(a)t} < e ~ nE * {a) 

where 

^(a) = ^[-log(C(a))+C(a)-l]. 

Finally noting that log (*) ( n ^ fe ) -> n/i(e, a/fc) as n -> oo, 
we have 



< e 
+e 



max e 

afc<a<(l-e)fc 
nco+log k 



-n(E 2 (a)-h(e,a/k))+logk 



For n — > oo, the ML estimator is asymptotically reliable if we 
have SNR condition (7) and E2 (a) > /i(e, a/fc) for afc < a < 
(1 — e)fc which holds under the condition (8). 

Lemma 7.5: [11] For positive integer r and random variable 
Z which has the distribution x 2 ( r ) an d f° r an Y e > we have 

P{Z>(l + e)r} < e-~^\ 

P{Z<(l-e)r} < exp(-^[-log(l-e)- e ]). 
Appendix C: Proof of Theorem 5.2

Again, the proof of Theorem 5.2 is a generalization of the result for the necessary condition in [11] to the MMV case.

We define $Z = X^S$. Then the pair $(S, Z)$ is equivalent to $X$. By the data processing inequality and the chain rule for mutual information, we have

$$I(B; Y) \ge I(Z, S; Y) = I(S; Y) + I(Z; Y|S) \qquad (24)$$

where $B = AX$ is the noiseless measurement. Since the noise



= £ll g(l + -LA*(X*X)) 
1=1 w 

< ^Ilog(l + -l K; (^)) (25) 



1=1 

-cri 



asymptotically, since $r e^{-cn} \to 0$ for $n \to \infty$, where $\lambda_l(X^* X)$ is the $l$-th largest eigenvalue of $X^* X$ for $1 \le l \le r$.

Then, we consider the information $I(S; Y)$. Given that $S$ is uniformly chosen over $\binom{n}{k}$ possibilities, the asymptotic number of bits we need to decode $S$ with distortion rate $\alpha$ is given by $n h(\epsilon) - n h(\epsilon, \alpha)$, where we used $\log \binom{n}{k} = n h(k/n) + O(\log n)$. By Fano's inequality, $P_e(\alpha) \to 0$ only if

$$I(S; Y) \ge n h(\epsilon) - n h(\epsilon, \alpha). \qquad (26)$$

Applying (24), (25) and (26), we have

$$m \ge \frac{n h(\epsilon) - n h(\epsilon, \alpha) + I(Z; Y|S)}{\frac{1}{2} \sum_{l=1}^{r} \log\left[1 + \kappa_l(\mathcal{X})/\sigma_w^2\right]}.$$
